fix(alerts): resolve fingerprint dedup causing alert history loss #25

Merged: KKamJi98 merged 3 commits into main from fix/alert-fingerprint-dedup on Mar 8, 2026


KKamJi98 (Contributor) commented Mar 2, 2026

Summary

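  • Problem: When Alertmanager re-fires an alert with the same labels after it has resolved, it generates the same fingerprint, so the ON CONFLICT (alert_id) UPSERT overwrites the existing record → alert history is lost and analysis results are overwritten
  • Fix: alert_id is split out into the ALR-{uuid} format, and fingerprint is kept as the grouping key. Only an alert with the same fingerprint that is still firing is UPDATEd; everything else is INSERTed as a new record
  • Additional fixes: SaveAlert uses an atomic COALESCE query to prevent a race condition, the resolved_at WHERE-clause mismatch is fixed, and the missing updates to labels/severity and other columns on ON CONFLICT are fixed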

Test plan

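  • go test ./... passes in full (including 15 new tests)
  • New firing → confirmed that ALR-xxx is created
  • Repeated firing with the same fingerprint → the existing ALR-xxx is UPDATEd
  • Re-firing after resolved → confirmed that a new ALR-yyy is created
  • Flapping detection works correctly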

Alertmanager reuses fingerprints for re-fired alerts, causing resolved-then-re-fired
alerts to overwrite existing records. Separate alert_id (ALR-UUID) from fingerprint
(grouping key) so each occurrence gets its own record.

- alert_id now uses the ALR-{uuid[:8]} format; fingerprint is kept as the grouping key
- Atomic COALESCE subquery in SaveAlert to prevent TOCTOU race conditions
- Partial unique index ensures one firing alert per fingerprint
- Fix resolved_at never being set (WHERE clause mismatch)
- Extract alertStore/alertSlacker/alertAnalyzer interfaces for testability
- Add 15 unit tests covering dedup scenarios
KKamJi98 force-pushed the fix/alert-fingerprint-dedup branch from 8e54219 to 5cf2e21 on March 2, 2026 16:23
KKamJi98 (Contributor, Author) commented Mar 8, 2026

Code review

Found 1 issue:

  1. RecordStateTransition inserts the fingerprint value into the alert_id column. The core design change of this PR is to separate alert_id (ALR-{uuid}) from fingerprint, yet in the alert_state_transitions table VALUES ($1, $1, ...) puts the same fingerprint value into both columns. This needs a fix: either add the actual alertID as a parameter, or document what the column means.

// RecordStateTransition records an alert state transition (fingerprint-based)
func (db *Postgres) RecordStateTransition(fingerprint, fromStatus, toStatus string, timestamp time.Time) error {
	query := `
		INSERT INTO alert_state_transitions (alert_id, fingerprint, from_status, to_status, transitioned_at)
		VALUES ($1, $1, $2, $3, $4)
	`
	_, err := db.Pool.Exec(context.Background(), query, fingerprint, fromStatus, toStatus, timestamp)
	return err
}

🤖 Generated with Claude Code


- AlertService struct: keep the interface abstractions (alertStore, alertAnalyzer) and incorporate appSettings/envFlapping
- alert_test.go: rename the field flappingConfig → envFlapping to match
- app_settings.go: add a swaggertype annotation for json.RawMessage (for swag parsing compatibility)
KKamJi98 merged commit d1fc66e into main on Mar 8, 2026
6 checks passed
KKamJi98 deleted the fix/alert-fingerprint-dedup branch on March 9, 2026 14:31